On February 18, DeepSeek introduced Native Sparse Attention (NSA). DeepSeek describes NSA as a hardware-aligned, natively trainable sparse attention mechanism for ultra-fast long-context training and inference. According to DeepSeek, the design is optimized for modern hardware, so NSA speeds up inference and reduces pre-training costs without compromising model performance. On general benchmarks, long-context tasks, and instruction-based reasoning, DeepSeek reports that NSA matches or outperforms full-attention models.